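Using circlestone-labs' Anima in ComfyUI

Anima's strengths are that it is lightweight and can render NSFW scenes with interacting characters. A less obvious advantage is that you do not need to know Danbooru tags: if you describe something in natural language and a matching Danbooru tag exists, the model can reproduce it.

However, the text encoder has only 0.6B parameters (current lightweight models mostly use a 4B encoder), so it cannot follow fine-grained instructions. For example, you cannot specify comic panel positions, you cannot describe a pose in natural language unless a Danbooru tag exists for it, and tags bleed between characters.

Even with a weak text encoder, counts and left/right placement do work. For example, you can specify the number and left/right position of thigh straps, or pin down an asymmetrical outfit.

Anima can only produce poses and objects that exist as Danbooru tags. Z-Image and FLUX.2 klein accept natural-language pose instructions and know a wide variety of objects, whereas Anima has a weak text encoder, a small model, and a skewed dataset, so it is not a general-purpose model.

So rather than doing everything with Anima alone, accept the trade-off and hand anything outside the Danbooru tag set to an editing model such as FLUX.2 klein or Qwen Image Edit.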

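If you use artist tags, the following workflow is already practical:

Rough image with Anima
Upscale
i2i with an Illustrious-derived model (adds detail, sharpens, locks the art style)
Segmentation with SAM2 or SAM3 (optional)
Detailer with an Illustrious-derived model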

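Drawbacks

Cannot render text
Background quality is poor (compared with Z-Image Turbo or FLUX.2 klein)
Tags bleed between characters, so prompts get long when there are multiple characters
Cannot specify things like comic panel positions
Cannot draw anything that has no Danbooru tag
Cannot follow natural-language instructions for poses that have no Danbooru tag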

Models

The VAE is the same one used by Qwen Image (Edit).

Location	File
models/unet	anima-preview.safetensors
models/text_encoders	qwen_3_06b_base.safetensors
models/vae	qwen_image_vae.safetensors

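CFG-distilled LoRA

RDBT - Anima

A LoRA that enables high-quality generation at CFG = 1. It roughly doubles generation speed, but negative prompts stop working, which makes the output harder to control. One option is to use it only in the second stage of a two-stage sampler.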
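The loss of the negative prompt at CFG = 1 follows from the standard classifier-free guidance formula. This is a general property of CFG, not something taken from the LoRA's own documentation; the symbols below are introduced here for illustration, with s as the CFG scale and the two epsilon terms as the model's predictions for the positive and negative prompts:

```latex
\hat{\epsilon} = \epsilon_{\mathrm{neg}} + s\,\bigl(\epsilon_{\mathrm{pos}} - \epsilon_{\mathrm{neg}}\bigr)
\qquad\Longrightarrow\qquad
\hat{\epsilon}\big|_{s=1} = \epsilon_{\mathrm{pos}}
```

At s = 1 the negative-prompt term cancels, so a sampler can skip that forward pass entirely. That leaves one model evaluation per step instead of two, which is consistent with the roughly 2x speedup.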

Workflow

Drag example.png into ComfyUI.

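Settings

Generation settings

Resolution: 1MP
30-50 steps
CFG 4-5
Samplers

er_sde: neutral style, flat colors, sharp lines. The recommended default sampler
euler_a: soft, with thin lines. Suited to 2.5D. Saturation does not rise much even at high CFG
dpmpp_2m_sde_gpu: similar to er_sde, but with more variation between images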

er_sde

Elucidating the solution space of extended reverse-time SDE for diffusion models

er_sde is a sampler that blends an ODE solver (fast but lower quality) with an SDE solver (higher quality but needing more steps). It computes a blend ratio suited to the step count.

Resolution

The following resolutions can be generated. Above 1536, torsos become elongated, so a two-stage sampler is needed.

1536 x 1536
1600 x 1088
1856 x 1024

ModelSamplingAuraFlow

The official workflow has no ModelSamplingAuraFlow node. Even without it, a timestep shift of 3 is applied by default.

Setting the shift to 6 adds background detail.
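To get a feel for what the shift value changes, here is a minimal sketch. It assumes the node applies the usual flow-matching shift, sigma' = shift * sigma / (1 + (shift - 1) * sigma); that formula and the function name `shift_sigma` are assumptions for illustration, not something the source confirms.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1] with the common flow-matching shift formula.

    Assumption: this is the formula behind the shift setting; the endpoints
    0 and 1 stay fixed while values in between move toward 1.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Compare the default shift (3) with the larger value (6) on an evenly spaced grid.
for shift in (3.0, 6.0):
    remapped = [round(shift_sigma(i / 4, shift), 3) for i in range(5)]
    print(f"shift={shift}: {remapped}")
```

Under this assumption a larger shift pushes the intermediate noise levels upward, so more of the step budget is spent at high noise.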

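Prompts

Improving drawing quality

Use artist tags. Put a half-width @ before the artist name (@artist name; written as @アーティストタグ in the example prompts below). Putting less-skilled artists in the negative prompt makes results much more stable. If you want a painterly (thick-paint) look, put anime-coloring artists in the negative prompt.

Artist name in the positive prompt only

Different artist names in both the positive and negative prompts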

Tag order

[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]

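Both the masterpiece-style quality tags and the Pony V7-style ones work. Note that the Pony V7 tags require underscores. Quality tags also affect the art style and tend to produce a typical "AI look".

Separately from the safety tags, Danbooru has the tags censored and uncensored.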

Category	Example tags
Quality tags	masterpiece, best quality, good quality, normal quality, low quality, worst quality; PonyV7 aesthetic model based: score_9, score_8, ..., score_1
Year tags	year 2025, year 2024, ...; newest, recent, mid, early, old
Meta tags	highres, absurdres, anime screenshot, jpeg artifacts, official art, etc
Safety tags	safe, sensitive, nsfw, explicit
Artist tags	@artist name
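The recommended tag order can be enforced with a small helper. This is only a sketch: `PromptParts` and `build_prompt` are hypothetical names introduced here, not part of ComfyUI or Anima. The one convention taken from the text is the `@` prefix on artist names.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Prompt sections, held in the order the tag-order line recommends."""
    quality_meta: list[str] = field(default_factory=list)  # quality/meta/year/safety tags
    count: list[str] = field(default_factory=list)         # 1girl, 1boy, 1other, ...
    character: list[str] = field(default_factory=list)
    series: list[str] = field(default_factory=list)
    artist: list[str] = field(default_factory=list)        # with or without the leading @
    general: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the sections in the recommended order, adding @ to artist names if missing."""
    artists = [a if a.startswith("@") else "@" + a for a in parts.artist]
    ordered = (parts.quality_meta + parts.count + parts.character
               + parts.series + artists + parts.general)
    return ", ".join(ordered)


example = PromptParts(
    quality_meta=["year 2025", "newest", "normal quality", "score_5", "highres", "safe"],
    count=["1girl"],
    character=["oomuro sakurako"],
    series=["yuru yuri"],
    artist=["nnn yryr"],
    general=["smile", "brown hair"],
)
print(build_prompt(example))
```

Keeping the sections separate makes it easy to drop the quality tags or swap the artist without disturbing the order of the rest.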

Prompt example

year 2025, newest, normal quality, score_5, highres, safe, 1girl, oomuro sakurako, yuru yuri, @nnn yryr, smile, brown hair, hat, solo, fur-trimmed gloves, open mouth, long hair, gift box, fang, skirt, red gloves, blunt bangs, gloves, one eye closed, shirt, brown eyes, santa costume, red hat, skin fang, twitter username, white background, holding bag, fur trim, simple background, brown skirt, bag, gift bag, looking at viewer, santa hat, ;d, red shirt, box, gift, fur-trimmed headwear, holding, red capelet, holding box, capelet

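Quality tags

When quality tags are present, artist tags cannot lock the art style (results vary). If you use artist tags, quality is fine without quality tags.

Conversely, if you want some randomness in the art style, adding quality tags is an easy way to get it.

ModelSamplingAuraFlow shift: 6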

@アーティストタグ, (thick lineart:0.5), highres, explicit.

A girl with blue eyes, eyelashes, medium breasts, breasts apart wearing a red shiny highleg string bikini is head tilt, grin and standing in contrapposto with v sign.

In the background, there is a beach with a sea and a palm tree.


Negative prompt

@アーティストタグ, 3d, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, watermark, logo, text, flat color, shaded face, blacklighting

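A high-resolution 9:16 image elongates the torso, so this was generated with a two-stage sampler (both stages Anima).

Illustrious masterpiece-style quality tags

Rating tag	Percentage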
worst quality	~8%
bad quality	~20%
average quality	~60%
good quality	~82%
best quality	~92%
masterpiece	~100%

Pony V7 quality tags

Scores (favorite counts or vote counts) bucketed into tiers in 10% steps.

Natural language

Natural language prompting tips

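When writing the prompt entirely in natural language, you need at least two sentences. A prompt that is too short does not produce a decent image.

Natural language and tags can be mixed.

Rules that must always be followed when specifying multiple characters

If you do not follow the rules below, the characters' features get mixed. When specifying multiple characters, you also need to describe each character's features.

Write the character's attributes right after the character name
Describe the features of every character; do not just list character names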

"Digital artwork of Fern from Sousou no Frieren, with long purple hair and purple eyes, wearing a black coat over a white dress with puffy sleeves..."

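Tag bleeding

If you specify clothing for only one of several people, the others wear it too. The same goes for expressions and poses. Once you specify something (clothing, expression, pose, position), you must specify that attribute for everyone; otherwise the tag bleeds and gives unintended results.

closed eyes is a problem because its opposite, open eyes, does not exist as a tag. Specifying an eye color such as brown eyes makes the eyes more likely to be open.

cutout tags bleed easily too. With navel cutout and breast cutout, specifying one makes the other appear frequently. With two characters, asking for navel cutout on one and breast cutout on the other is almost never honored.

nail polish has no counterpart tag meaning "no nail polish", so male characters end up with painted nails too.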
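The rule "if one character specifies an attribute, every character must" is easy to check mechanically before writing the prompt. The sketch below is illustrative only: the function name `missing_attributes` and the category names (clothing, pose, and so on) are hypothetical, chosen to mirror the categories listed above.

```python
def missing_attributes(characters: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Report, per character, the attribute categories that some other
    character specifies but this one does not (candidates for tag bleeding)."""
    all_categories: set[str] = set()
    for attrs in characters.values():
        all_categories.update(attrs)

    report: dict[str, list[str]] = {}
    for name, attrs in characters.items():
        gaps = sorted(all_categories - set(attrs))
        if gaps:
            report[name] = gaps
    return report


scene = {
    "girl on the left": {"clothing": "pink camisole", "pose": "sitting on a stool"},
    "girl in the middle": {"clothing": "white collared shirt"},
}
# The middle girl has no pose, so the left girl's pose is likely to bleed onto her.
print(missing_attributes(scene))
```

An empty result means every category is covered for every character. That is the precondition described above, not a guarantee that bleeding will not occur.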

Emphasis syntax

AnimaTokenizer.tokenize_with_weight() in comfy/text_encoders/anima.py resets the Qwen 3 0.6b weights to 1.0, but the weights from the t5xxl tokenizer still take effect, so emphasis syntax such as (tag:2) works.

Anima tokenizes the prompt with two tokenizers, Qwen3 0.6b and t5xxl¹, then encodes both token sequences with Qwen 3 0.6b to create two embeddings². These are concatenated and fed to the DiT³.
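For readers unfamiliar with the (tag:weight) notation, the following sketch shows how such a prompt breaks down into tags and weights. It is a deliberately simplified illustration, not ComfyUI's actual parser: it handles only flat, comma-separated tags and does not handle nesting or escaped parentheses. The name `parse_weights` is hypothetical.

```python
import re

# Matches a whole chunk of the form "(some tag:1.5)".
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (tag, weight) pairs.

    Chunks without an explicit weight get 1.0.
    """
    pairs: list[tuple[str, float]] = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = _WEIGHTED.fullmatch(chunk)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((chunk, 1.0))
    return pairs


print(parse_weights("(thick lineart:0.5), highres, (official art:1.5)"))
```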

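Borders and bars

If black borders appear, add highres or absurdres to the prompt. Putting border, pillarboxed, letterboxed in the negative prompt has little effect.

Area control

Whatever is described in detail gets drawn larger. If you describe the background in detail, the characters are drawn smaller. With multiple characters, describing only one of them in detail causes the others to be drawn smaller, placed farther away, or cropped.

Avoiding face shadows

Put shaded face, backlighting in the negative prompt.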

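Text rendering

Latin letters render reasonably well.

Putting english text, engrish text, chinese text, korean text in the negative prompt makes sound effects more likely to come out in Japanese.

Tips

Size specifications work only some of the time. The basic approach is to describe what is visible.

Bad: small navel cutout or large navel cutout
Good: navel cutout, Her abdomen is mostly exposed.

Melting fingers

Increase the step count
Raise the resolution
Set a shift value of 6 or higher in the ModelSamplingAuraFlow node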

Art style

Quality tags (masterpiece, score_9, etc.) affect the art style. Remove them in cases like the following.

When you want to lock an artist's style
When you want an anime look (add anime screenshot)
When you want to resemble official illustrations (add official art or game cg)

Tags useful for style control

Quality tags
Realistic

3d
realistic

Line art

lineart
no lineart
thick lineart

Rough

sketch
oekaki
partially colored

Medium

painterly (thick-paint style)
watercolor (medium)
flat color

Color tone

pastel colors
high contrast
sepia
warm colored
cool colored

Anime

anime coloring
anime screenshot

Game

game cg
game screenshot

Level of detail

absurdly detailed composition
complex exterior
loaded interior
messy room
too many
scenery

Inference speed

Environment

Windows11 25H2
RTX3050
RAM 32 GB
python 3.12.9
torch 2.9.1+cu128
triton_windows-3.5.1.post23
sageattention-2.2.0+cu128torch2.9.0.post4
CFG 5

Without SageAttention

Resolution	Speed (s/it)	30 steps (s)
1024 x 1024	2.5	80
1024 x 1408	3.7	110

With SageAttention

Resolution	Speed (s/it)	30 steps (s)
1024 x 1024	2.32	70
1024 x 1408	3.30	100
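The two tables can be cross-checked with a few lines of arithmetic. The sketch below uses only the s/it figures from the tables; the helper names are hypothetical. The 30-step column appears to be a rounded wall-clock figure, so it will not match s/it x 30 exactly.

```python
STEPS = 30

# (s/it without SageAttention, s/it with SageAttention), copied from the tables.
measurements = {
    "1024 x 1024": (2.5, 2.32),
    "1024 x 1408": (3.7, 3.30),
}


def sampling_seconds(sec_per_it: float, steps: int = STEPS) -> float:
    """Pure sampling time: seconds per iteration times the step count."""
    return sec_per_it * steps


def speedup(baseline: float, optimized: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline / optimized


for resolution, (plain, sage) in measurements.items():
    print(
        f"{resolution}: {sampling_seconds(plain):.1f}s -> {sampling_seconds(sage):.1f}s "
        f"(x{speedup(plain, sage):.2f})"
    )
```

On these numbers SageAttention gives roughly an 8% to 12% reduction in per-step time on this GPU.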

Examples

Settings:

30 steps
cfg 5
sampler: er_sde
scheduler: simple

masterpiece, best quality, @アーティストタグ.

There are three girls in a room.

The girl on the left has short red hair and blue eyes. She is sitting on a stool. She is wearing a pink camisole and gray dolphin shorts.

The girl in the middle has long silver hair and red eyes. She is standing. She is wearing a white collared shirt and a black pencil skirt.

The girl on the right has medium brown hair and green eyes. She is sitting on a stool. She is wearing a beige sweater and a blue denim.

There is a potted plant, a frying pan on the kitchen wall in the background.


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

@アーティストタグ, masterpiece, best quality

# play

uncensored, vaginal, sex from behind, rough sex, from side

# girl

on the left side, skinny, black long hair, eyelashes, red eyes, open mouth, medium breasts, choker, blue cotton panties, panties aside, pigeon-toed, barefoot

# boy

on the right side, grabbing another's ass

# effect

cum in pussy, sound effects, motion lines, sweat

# background 

indoors, carpet, table, door, painting (object)


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

Natural language lets you specify poses that are hard to get with tags alone.

@アーティストタグ, masterpiece, best quality, highres, 2girls, explicit, futa with female, indoors, bed sheet, pillow, window, curtains, yuri, vaginal sex, cum in pussy. 

A girl with brown long hair is naked on the left, perky breasts, puffy nipples, lying, split, spooning, her left leg up vertically and her right leg is straight, embarrassed.

A futanari girl with black long hair, perky breasts, puffy nipples, nail polish, grin, is naked on the right. She is looking down. barefoot, wariza. She is holding the brown haired girl's leg, futanari. She is straddling on the brown haired girl's right leg.


Negative prompt

@アーティストタグ, 3d, white background, bar censor, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, watermark, logo, motion lines, text, flat color, shaded face,
blacklighting

The faceless fat bald ugly bastard tag does not bleed at all, which alone makes it worth using. v sign does bleed. looking to the left side has no effect.

Styles of highly skilled artists tend not to render the ugly bastard as ugly.

@アーティストタグ, highres, explicit, 1boy, 1girl, indoors, hotel room.

A faceless fat bald ugly bastard with grin wearing t-shirt is standing on the right side. He grabbed the girl's breast over shoulder and v sign with his right hand.

A embarrassed girl with perky breasts, puffy nipples is standing and looking to the left side. arm around shoulder. Her arms at side.


Negative prompt

@アーティストタグ, upper teeth only, 3d, sketch, flat color, bar censor, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, watermark, logo, text, shaded face, blacklighting

@アーティストタグ, masterpiece, best quality, indoors.

There are four balls on a table. 

A left top ball is red. A blue right top ball is twice the size of the red ball. A yellow left bottom ball is twice the size of the blue ball. A green right bottom ball is twice the size of the yellow ball.


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

The model is faithful to color, position, and count, but size never follows the instructions no matter how many times it is retried.

@アーティストタグ, masterpiece, best quality, 2girls, outdoors.

A girl with brown hair wearing a school uniform is standing.

Another girl with red hair wearing a t-shirt and a skirt is standing behind the girl far away.

Depth can be specified.

Anima

@アーティストタグ, masterpiece, best quality, 1girl, indoors, carpet.

A girl with long hair wearing a school swimsuit is standing on the floor. 

# arms

Her left arm is straight up, and her right arm is straight out to the side. 

# legs

She is stepping on the sofa seat with her right foot and her left foot is on the floor. 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

FLUX.2 klein 9b

Anime style.

A girl with long hair wearing a competition swimsuit is standing on the floor. 

# arms

Her left arm is straight up, and her right arm is straight out to the side. 

# legs

She is stepping on the sofa seat with her right foot and her left foot is on the floor. 

# room
There is a carpet.

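No matter how many times it is retried, the pose in the prompt is not produced. Note that the Danbooru tag for a sofa is couch.

The next example took dozens of retries. With three or more character tags, the success rate of keeping the characters distinct is very low.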

@アーティストタグ, masterpiece, best quality, 3girls, 

hatsune miku is wearing a grey shirt with bare shoulders. She is lying on back.

kagamine rin with short blonde hair wearing a sailor shirt. girl on top. She is straddling on hatsune miku.

megurine luka with pink long hair with crossed arms. She is standing behind them. She is looking at the girls.

# background 

indoors, window, curtain, 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

@アーティストタグ, masterpiece, best quality,

kasane teto (sv), red drill hair, grey jacket, layered skirt


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

Although this is a preview release, 1girl accuracy is high, so the model has likely already gone through a substantial amount of pretraining.

sensitive, newest, year 2025, anime screenshot, 1girl, outdoors, school uniform, from below


Negative prompt

@厚塗り系アーティスト, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

Quality tags (masterpiece, score_9, etc.) push the result toward an illustration look, so remove them if you want an anime look.

ModelSamplingAuraFlow shift: 6

idolmaster cinderella girls, (official art:1.5), (game cg:1.5), (realistic:0.4), masterpiece, highres, safe, indoors, window.

takagaki kaede, mature, flat chest, shorts, pantyhose, dark green dress, fringe trim, mole under eye under her right eye, :D, off shoulder, three-quarter sleeves. She has heterochromania that her left eye is green and her right eye is blue.


Negative prompt

@アーティストタグ, upper teeth only, 3d, bar censor, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, watermark, logo, text, shaded face, blacklighting

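Generated with a two-stage sampler (both stages Anima).

Being able to pin down the heterochromia eye colors and the mole position is handy.

The outfit cannot be reproduced, so use a LoRA or an editing model.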

@アーティスト名, 1girl, chibi, portrait, white background

frieren, capelet, floating earrings, gold trim, green eyes, pointy ears, sleeve cuffs, smile, striped shirt, finger to mouth, sideways glance, looking at viewer, half-closed eyes


Negative prompt

backlighting, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

@アーティストタグ, masterpiece, best quality, 2boys, 1girl, 

The three of them are lined up in a row.

# girl
The standing girl is wearing a school uniform, ahegao, double v.

# boy 1
A standing boy wearing a t-shirt and pants is expressionless and looking looking outside(left) on the left.

# boy 2
Another boy wearing a school swimsuit is looking at the girl with surprising, open mouth. He is standing on the right. crossed arms.

# background

indoors, classroom, chalkboard


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch

The Markdown heading symbol # leaks into the image, so the model may not understand Markdown.

@アーティストタグ, masterpiece, best quality, 3girls, multiple views, brown long hair, indoors, classroom, chalkboard,

# Left
The standing girl is wearing a school uniform, v, smile.

# Middle
The girl is wearing a bra and a panties, embarrassed.

# Right
The angry girl wearing a school swimsuit. pointing at viewer. 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

@アーティストタグ, masterpiece, best quality, 2koma, comic, 1girl, 1boy, indoors

# koma 1

1girl, solo, :D, seductive smile, brown eyes, long eyelashes, looking at viewer, portrait, collared shirt, suit jacket, id card, straight-on, head tilt

# koma 2

from side, nude, sex from behind, holding another's wrist, doggy style, torogao, perky breasts, ass, closed eyes


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

@アーティストタグ, highres, absurdres, explicit, uncensored, 3koma, multiple views, doujinshi, manga, comic, realistic, sound effects, motion lines, speech bubbles, heart \(symbol\), 1girl, 1boy, clothed female nude female,

A girl wearing a sweater with covered nipples, lace-trimmed panties, skindentation, full body on the right.

A girl is naked, perky breasts, large areolae, puffy nipples, upper body on the left.

A naked girl sex with faceless fat bald ugly bastard, cum in pussy on the left bottom.


Negative prompt

@アーティストタグ, censored, upper teeth only, 3d, sketch, flat color, bar censor, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, watermark, logo, text, shaded face, blacklighting,
english text, engrish text, chinese text, korean text

Instructions such as on the right, on the left bottom, and on the right top work fairly well.

LoRA training

It is better to wait until ai-toolkit or sd-scripts adds support.

Successful LoRA training has been reported with diffusion-pipe.

The training script for the Anima model has already been implemented for sd-scripts #35

Other information

Text encoders of lightweight models

	Z-Image (Turbo)	FLUX.2 Klein 4b	Newbie (Lumina-Image 系列)	Netayume (Lumina-Image 2.0)	Anima (Cosmos-Predict2-2B)
Text encoder	Qwen 3	Qwen 3	Gemma 3	Gemma 2	Qwen 3
Text encoder parameters	4b	4b	4b	2b	0.6b

According to the circlestone-labs/Anima license, Anima is a derivative of NVIDIA's Cosmos-Predict2-2B-Text2Image.

Anima's text encoder is Qwen3 0.6B and its VAE is the Qwen-Image VAE. Cosmos-Predict2-2B-Text2Image itself uses T5XXL as the text encoder and the Wan2.1 VAE.

nvidia/Cosmos-Predict2-2B-Text2Image

Cosmos-Predict2 World Simulation Model for Physical AI

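This is presumably a model intended for reinforcement learning of robots. The training dataset contains a lot of in-vehicle, factory, and kitchen video data.

The text encoder is the encoder half of T5 XXL only, with roughly 4.7B parameters. SD3 uses the same text encoder.

The VAE is Wan-AI/Wan2.1-T2V-1.3B-Diffusers.

The recommended prompt length is 300 words or fewer, and the base resolution is 1280 x 704. Since 720 is not divisible by 64, it is rounded down to 11 x 64 = 704.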
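The rounding in the last sentence can be reproduced directly. This is a minimal sketch of the arithmetic only; the helper name `floor_to_multiple` is hypothetical.

```python
def floor_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple (integer floor division)."""
    return (value // multiple) * multiple


# 720p height is not a multiple of 64, so it drops to 11 * 64.
print(floor_to_multiple(720))   # 704
# The width is already a multiple of 64 (20 * 64), so it is unchanged.
print(floor_to_multiple(1280))  # 1280
```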

Limitations

Artifacts appear when you try to generate high-resolution images.